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Abstract 

This paper studies theory and inference related to a class of time series models that in- 
corporates nonlinear dynamics. It is assumed that the observations follow a one-parameter 
exponential family of distributions given an accompanying process that evolves as a function of 
lagged observations. We employ an iterated random function approach and a special coupling 
technique to show that, under suitable conditions on the parameter space, the conditional mean 
process is a geometric moment contracting Markov chain and that the observation process is ab- 
solutely regular with geometrically decaying coefficients. Moreover the asymptotic theory of the 
maximum likelihood estimates of the parameters is established under some mild assumptions. 
These models are applied to two examples; the first is the number of transactions per minute 
of Ericsson stock and the second is related to return times of extreme events of Goldman Sachs 
Group stock. 

Keywords: Absolute regularity; Ergodicity; Geometric moment contraction; Iterated random 
functions; One-parameter exponential family; Time series of counts 

1 Introduction 

With a surge in the range of applications from economics, finance, environmental science, social 
science and epidemiology, there has been renewed interest in developing models for time series 
of counts. The majority of these models assume that the observations follow a Poisson distribu- 
tion conditioned o n an accompanying intensity process that drives the dyna mics of the models, e.g., 
Davis et al. (20031 ) , iFokianos et al. (20091 ) , iNeumann f2"oill ') , IStreett (20odi ] and lPoukhan et al. (201^ . 



According to wheth er the evolu tion of the intensity process depends on the observations or solely on 



an external process. ICox (19811 ) classified the models into observation-driven and parameter-driven. 
This paper focuses on the theory and inference for a particular class of observation-driven models. 

Many of the commonly used models, such as the Poisson integer- valued GARCH (INGARCH), 
are special cases of our model. For an INGARCH, the observations {Yt} given the intensity process 
{Xt} follow a Poisson distribution and is a linear combination of its lagged values and lagged 
Yt. The model is capable of capturing positive temporal correlation , in the observations and it is 



relatively easy to fit via maximum likelihood. iFerland et al. (20061) showed the seco nd moment 



stationarity through a sequence of approximating processes and iFokianos et al. f2009l i established 
the consistency and asymptotic normality of the MLE by introducing a perturbed model. However, 
all the above results rely heavily on the Poisson assumption and the GARCH- like dynamics of Xt- 
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Later Neumann (201 ll ) relaxed the linear assumption to a gene ral contracting evolut ion rule and 



proved the absolute regularity for this Poisson count process and lDoukhan et al. (20121 ) showed the 
existence of moments under similar conditions by utilizing the concept of weak dependence. 

In our study the conditional distribution of the observation given the past is assumed to fol- 
low a one-parameter exponential family. The temporal dependence in the model is defined through 
recursions relating the conditional mean process Xt with its lagged values and lagged obs erva- 



tions. Theory from iterated random functions (IRF), see e.g.. iDiaconis and Freedman fl999l ^ and 



Wu and Shao r2004l ). is utilized to establish some key stability properties, such as existence of a sta- 



tionary and mixing solution. This theory allows us to consider both linear and nonlinear dynamic 
models as well as inference questions. In particular, the asymptotic normality of the maximum 
likelihood estimates can be established. The nonlinear dynamic models are also investigated in a 
simulation study and both linear and nonlinear models are applied to two real datasets. 

The organization of the paper is as follows. Section 2 formulates the model and establishes 
stability properties. The maximum likelihood estimates of the parameters and the relevant asymp- 
totic theory are derived in Section 3. Examples of both linear and nonlinear dynamic models are 
considered in Section 4. Numerical results, including a simulation study and two data applications 
are given in Section 5, where the models are applied to the number of transactions per minute of 
Ericsson stock and to the return times of extreme events of Goldman Sachs Group (GS) stock. 
Some diagnostic tools for assessing and comparing model performance are also given in Section 5. 
Appendix A reviews some standard properties of the one-parameter exponential family and the 
proofs of the key results in Sections 2-4 are deferred to Appendix B. 

2 Model formulation and stability properties 
2.1 One-parameter exponential family 

A random variable Y is said to follow a distribution of the one-parameter exponential family if its 
probability density function with respect to some cr-finite measure ji is given by 

p{y\r]) = exp{?7?/ - A{r])]h(y), y > 0, (2.1) 

where rj is the natural parameter, and A{rj) and h{y) are known functions. If B(rj) = A'{ri), then 
it is known that El" = 8(7]) and Var(y) = B'(n). The d erivative of A{r]) exists generally for the 



exponential family, see e.g., iLehmann and Casella (19981 ). Since B'[rf) = Var(y) > 0, so B[r]) is 



strictly increasing, which establishes a one-to-one association between the values of rj and B{rj). 
Moreover, because we assume that the support of Y is non-negative throughout this paper, so 
B{rj) = Ey > 0, which implies that A['q) is strictly increasing. Other properties of this family of 
distributions are presented in Appendix A. 

Many familiar distributions belong to this family, including Poisson, negative binomial, Bernoulli, 
exponential, etc. If the shape parameter is fixed, then the gamma distribution is also a member 
of this family. While we restrict consideration to only the univariate case, extensions to the multi- 
parameter exponential family is a topic of future research. 

2.2 Model formulation 

Set J-Q = cr{r7i}, where 771 is a natural parameter of (j2.ip and assumed fixed for the moment. Let 
Yi,Y2,... be observations from a model that is defined recursively in the following fashion, 

Yt\Ft-i^p{y\m), Xt=ge{Xt^i,Yt^i), (2.2) 
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for all t > 1, where p{y\r]t) is defined in ()2.ip . Tt = a{7]i,Yi, . . . ,Yt} and Xt is the conditional 
mean process, i.e., Xt = B{r]t) = Fi{Yt\J-'t-i) ■ Here ge{x,y) is a non-negative bivariate function 
defined on [0,oo) x [0, oo) when Yt has a continuous conditional distribution or on [0,oo) x No, 
where No = {0, 1, . . .}, when Yt only takes non-negative integers. Throughout, we assume that the 
function gg satisfies a contraction condition, i.e., for any x,x' > 0, and y,y' G [0, oo) or No, 

\9eix,y) - ge{x',y')\ < a\x - x'\ +b\y-y'\, (2.3) 

where a and b are non- negative constants with a + b < 1 . Note that (|2.3p implies 

ge{x,y) < ge{0,0) + ax + by, foranyx,y>0. (2.4) 



We point out that model (j2.2p with the function g^ satisfying (j2.3p includes the Poisson INGARCH 
model (see Example [T]) and the exponential autoregressive model (|4.14p as special cases under some 
restrictions on t he parameter space . The generalized linear autoregressive moving average model 



(GLARMA) (see lDavis et al. (20031 )) also belongs to this class, although the contraction condition 
is not necessarily satisfied. Only under very simple model specifications have the stability properties 
of GLARMA been established and the relevant work is still ongoing. The primary focus of this 
paper is on the conditional mean process {Xt}, which can be easily seen as a time-homogeneous 
Markov chain. Note that the observation process {Yt} is not a Markov chain itself. 

2.3 Strict stationarity 

The iterated random function approach (see e.g., Diaconis and Freedman (19991 ) and Wu and Shao (2004 )) 



provides a useful tool when investigating the stability properties of Markov chains and turns out to 
be particularly instrumental in our research. In the definition of iterated random functions (IRF), 
the state space (W, p) is assumed to be a complete and separable metric space. Then a sequence 
of iterated random functions {fot} is defined through 

Wt = feAWt-i), tGN, 

where {6t}t>i take values in another measurable space and are independently distributed with 
identical marginal distribution, and VFq is inde pendent of \9t\t>^ ■ 



In working with iterated random functions, IWu and Shao f2004l l introduces the idea of geomet- 



ric moment contraction (GMC), which is useful for deriving further properties of IRF. Our research 
is also relying heavily on GMC. Suppose there exists a stationary solution to the Markov chain 
{Wt}, denoted by vo, let IVojI^o ~ be independent of each other and of {6t}t>i-, and define 
Wt{w) = fe^ o fet_^ o . . . o fg^[w). Then {Wt} is said to be geometric moment contracting if there 
exist an a > 0, a C = C{a) > and an r = r{a) G (0, 1) such that, for all i G N, 

E{p''{Wn{Wo),Wn{W!,m < Cr^. 

The conditional mean process {Xt} specified in (j2.2p can be embedded into the framework of IRF 
and shown to be GMC. 

In this section and the next we use g to represent the function gg in (12. 2p evaluated at the true 
parameter. For any u G (0, 1), the random function fu{x) is defined as 

Ux):=g[x,F-\u)), (2.5) 

where is the cumulative distribution function oi p{y\r]) in (|2.ip with x = B{i]), and its inverse 
F~^{u) := inf{t > : Fx{t) > u} for u G [0,1]. Let {Ut} be a sequence of independent and 
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identically distributed (iid) uniform (0, 1) random variables, then the Markov chain {Xf} defined 
in (12. 2p starting from Xq = x can be represented as the so-called forward process Xt{x) = (fu^ o 
fut-i°- ■ ■°fui)ix)- The corresponding backward process is defined as Zt{x) = {fui°fu2°- ■ ■°fut){^)i 
which has the same distribution as for any t. 

Proposition 1. Assume model (j2.2p and that the function g satisfies the contraction condition 
(fzaj) . Then 

1. There exists a random variable Z^o such that, for all x £ S, Zn{x) — t- Z^o almost surely. The 
limit Zoo does not depend on x and has distribution vr, which is the stationary distribution 
of {Xt}. 

2. The Markov chain {Xt, t > 1} is geometric moment contracting with vr as its unique stationary 
distribution. In addition, J^nXi < oo. 

3. If {Xt,t > 1} starts from vr, i.e., Xi ~ vr, then {Yt,t > 1} is a stationary time series. 

Proposition [T] implies that starting from any state x, the limiting distribution of the Markov 
chain Xn{x) exists and the n-step transition probability measure P^{x, ■) converges weakly to vr, 
as n — )• oo. 

2.4 Ergodicity 

In this section we further investigate the stability properties, including ergodicity and mixing for 
model ()2.2p . Under the conditions of Proposition [1] the process {{Xt,Yt)} is strictly stationary, so 
we can extend it to be indexed by all the integers. The following proposition establishes ergodicity 
and absolute regularity when Yf is discrete. 

Proposition 2. Assume model (j2.2p where the support of Yj is a subset of No = {0, 1, . . . , }, 
and that g satisfies the contraction condition (12. 3p . Then 

1. There exists a measurable function g^o ■ = {(ni, n2, . . .), G No, i = 1, 2, . . .} — > [0, oo) 
such that Xt = goo{Yt-i, lj_2, • • •) almost surely. 

2. The count process {Yt} is absolutely regular with coefficients satisfying 

/3(n)<(a + 6)7(l-(a + 6)), 
and hence {{Xt,Yt)} is ergodic. 

When Yt has a continuous distribution, geometric ergodicity of {Xt} can be established under 
stronger conditions on g. The proof of the result relies on the classic Markov chain theory since 
{Xt} is 0-irreducible due to the continuity of the distribution in this situation. 

Proposition 3. Assume model (j2.2p where the support of Yt is [0, oo), and that the function g 
satisfies the contraction condition (j2.3p . Moreover if g is increasing and continuous in (x,y), then 

1. There exists ^oo : [0, oo)°° — [0, oo) such that Xt = goo(Yt^i,Yt^2-, ■ ■ •) almost surely. 

2. The Markov chain {Xt,t > 1} is geometrically ergodic provided that a + b < 1, and hence 
{{Xt,Yt)} is stationary and ergodic. 
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3 Likelihood Inference 



In this section, we consider maximum likelihood estimates of the parameters and study their asymp- 
totic behavior, including consistency and asymptotic normality. Denote the d— dimensional param- 
eter vector by G M*^, i.e., 9 = {di, . . . , da)'^, and the true parameter vector by = (6*1) ■ ■ ■ , ^S)^- 
Then the likelihood function of model (j2.2p conditioned on iji and based on the observations 
Yi,...,Yn is given by 

n 

L{e\Y,, ...,Yn,m) = ll ^Mvt{0)Yt - A{l^t{e))}h{Yt), 
t=i 

where rjtiO) = B^^{Xt{6)) is updated through the iterations Xt = go{Xt-i,Yt-i). The log- 
likelihood function, up to a constant independent of 0, is given by 



t=i t=i 

with score function 

sM = '-^ = t{y.-B(,mf-^. (3.2) 

t=i 

The maximum likelihood estimator On is a solution to the equation Sn{d) = 0. Let Pg^ be the 
probability measure under the true parameter a-nd unless otherwise indicated, E[-] is taken 
under 0q. Recall that Xt = g^{Yt-i,Yt-2, ■ ■ ■) according to part (a) of Propositions [2] and [3l We 
will derive the asymptotic properties of the maximum likelihood estimator 0„ based on a set of 
regularity conditions: 

(AO) 9o is an interior point in the compact parameter space @ G M'^. 

(Al) For any 9 £ Q, > x*q £ TZ{B), where TZ{B) is the range of B{r]). Moreover > x* e TZ{B) 
for all 9. 

(A2) For any y G [0,oo)°° or Ng°, the mapping i— )• g^{y) is continuous. 

(A3) g{x,y) is increasing in {x,y) if Yt given J-t-i has a continuous distribution. 

(A4) E{Yi supeee B-^gLiYo, Y.i, . . .))} < oo. 

(A5) If there exists a t > 1 such that Xt{9) = Xt{9Q), Pg^^-a.s., then 9 = 9q. 
(A6) The mapping 9 i— t- g^^ is twice continuously differentiable. 
(A7) 'E{B'{r^i{9o)){drii{9)/d9i)\=e,} < oo, for i = l,...,d. 

Strong consiste ncy of the estirn ates is derived according to the lemma below, which is adapted from 
Lemma 3.11 in iPfanzagl 

Lemma 1. Assume that C M'^ is a compact set, and that ($7, P) is a probability space. 
Let {fo : i— > [—oo, oo], 9 G &} be a family of Borel measurable functions such that: 

1. ^ I— 7- /^(x) is upper-semicontinuous for all x G M°°. 
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2. supg^Q fg{x) is Borel measurable for any compact set C C 0. 

3. E{supgg@ fg{X)} < oo for some random variable X defined on {Q,J^,P). 
Then 

1. 6 ^ E[fg{X)] is upper-semicontinuous. 

2. If {Xf : Q I— 7- M°°,t G Z} is an ergodic stationary process defined on (17, J^, P), and for all t, 
Xt has the same distribution as X, then 

1 " 

limsupsup-\^/£)(Xi) < supE{/e(Xi)}, a.s.-P, 
for any compact set C. 

Pfanzagl f 19691 ) proved the result assuming the independent structure of {Xf}, but the same result 
proves to be true provided that the strong law of large numbers can be applied. By virtue of Lemma 
[U we can derive the strong consistency of the estimates. 

Theorem 1. Assume model (12. 2p with the function g satisfying the contraction condition ()2.3p . 
and that assumptions (A0)-(A5) hold. Then the maximum likelihood estimator On is strongly 
consistent, that is, 

On — -> 00, as n — )■ oo. 



The following th eorem addresses th e asymptotic distribution of the MLE and the idea of proof 



is similar to that in lDavis et al. (20031 ). Unless otherwise indicated, rjt and fjt are both evaluated 
at 6*0, i.e., r/t = r/t(6'o) and fit = {d'nt/d9)\g=g^. 

Theorem 2. Assume model (12. 2p with the function g satisfying the contraction condition ()2.3p , 
and that assumptions (A0)-(A7) hold. Then the maximum likelihood estimator On is asymptotically 
normal, i.e., 

Vn{On-Oo) ^ N{0,n-'^), as n^oo, 
where Q = Fi{B'{rit)f]tr]'[}. 

We remark that in practice, the population quantities in Q can be replaced by their estimated 
counterparts. Examples of such substitution will be illustrated below in specific models. 

4 Examples 

4.1 Linear dynamic models 

The conditional mean process {Xt} in these models has GARCH-like dynamics. Specifically they 
are described as 

YtlTt^i ~ p{y\r]t), Xt = 5 + aXt^i + pYt~i, (4.1) 
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where Xt = B{r]t) = E(Yt\Tt-i), and 6 > 0,a, (3 > are parameters. Observe that model (|4.ip is 
a special case of model ()2.2p by defining the function gg as 

9e{x,y) = S + ax + /3y, (4.2) 

with 6 = {6, a, (3)'^ and the contraction condition (j2.3p corresponds to a + /3 < 1. Note that by 
recursion we have, for all t, 

oo 

Xtie) = 6/{l-a) + pY.a''Yt_i_k- (4.3) 

fc=0 

It follows that Xt{9) > x* = 5/{l — a) since Yt only takes non- negative values. A direct application 
of Propositions [H [2] and [3] gives the stability properties of model (j4.ip . 

Proposition 4- Assume model (14.11) with a + P < 1. Then the process {Xt,t > 1} has a unique 
stationary distribution tt, and {(Xt,l^),t > 1} is ergodic if Xi ~ vr. 

If ^0 = (i^Oi ctQ, /3o)"^ denotes the true parameter vector, then the log-likelihood function l{9) and 
the score function Sn{0) of model (j4.ip are given by (j3.ip and ()3.2p respectively, where drjt{9)/d9 = 
{dr]t/d5,dr]t/da,d'r]t/d(3)'^ is determined recursively by 
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The maximum likelihood estimator On is a solution of the equation Sn{0) = 0. Furthermore, the 
Hessian matrix can be found by taking derivatives of the score function, i.e., 

t=i 



where 



d\ f B"{r,t) diit B'{i^t-i)B'{r^t)diit-i B'{Tjt_,)B"{r,t) drh 



dOdQT \{B'{r]t))^ de {B'{i]t)Y 86 {B'{'nt)Y 89 

^Yt^,B"{7^t)8in\ , ,,T B'ivt-i) 8vt-i , B"ir,t-i)B'M 



{B'iVtW 80 J ' ' ' B'{7u) 80T (B'ivt)) 

dr)t-idvt-i B'{'nt-i)B"{'nt)8iit8vt , B'{Tjt-i)8\- 
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86 8eT {B'{r]t)y 80 86^ B'{r]t) 8686^' 

It follows from the representation with the infinite past (|4.3p that assumptions (A1)-(A3) and (A6) 
are satisfied. In order to apply Theorem [2] when investigating the asymptotic behavior of the MLE, 
we need to impose the following regularity conditions: 

(LO) The true parameter vector 9q lies in a compact neighborhood Q G of where @ = {9 = 
{5,a, 13)'^ £Rl : < 5l < S < 5u,e < a + j3 < 1 - e} for some e > 0. 

(LI) E{yisupeee^'HV(l-«)+/3Er=o«'^-fc)} <oo. 

(L2) E{B'imi9o)){dm{9)/d9^f\e=eo} < oo, for i = 1,2,3. 
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Theorem 3. Assume model (j4.ip and that assumptions (L0)-(L2) hold. Then the maximum 
likelihood estimator 6n is strongly consistent and asymptotically normal, i.e., 



as n — )• oo, 
No- 



where Q. = E{B'{7]t)r]triT}^ where rjt = ??j(6'o) and Vt = ^ 



Remark 1. Under the contraction condition a + (3 < 1, {l^} can be represented as a causal 
ARMA(1,1) process. To see this, denote dt = Yt — Xt, then it follows from E((it|J-t_i) = that 
{dt,t G Z} is a martingale difference sequence. Therefore model (j4.ip can be written as 

Yt-{a + f3)Yt_i = 5 + dt-adt_i. (4.5) 

Denote 7y(/i) as the auto- covariance function of jlfj. If 7y (0) < oo, then7y(/i) = (a+/3)'*~^7y (1), 
for h>l, see for example iBrockwell and Davis (199ll ). 

In practice, it can be difficult to verify assumptions (LI) and (L2), so we provide some alternative 
sufficient conditions for them in the following two remarks. 

Remark 2. A sufficient condition for assumption (LI) is 

oo 

F.{YiB-\5u/e + Y,{1 - efYi^k)] < oo, 

k=l 

provided that 6u/e + Ylh=ii^ ~ ^)^^i-fe is in the range of B{r]). This can be seen by noting that 
Xi{6)<5u/e + ET=ii^-e)'Y,^k. 

Remark 3. If A"{r]t) > c for some c > 0, this is true, for example, when A"{rj) is increasing 
and A"{B^^{6l)) > 0, then a sufficient condition for assumption (L2) is 7y(0) < oo. 

Next we consider some specific models belonging to class (j4.ip . most of which are geared towards 
modeling time series of counts. 

Example 1. As a special case of the linear dynamic model (|4.ip with rjt = log At and A{rit) = e''* , 
the Poisson INGARCH(1, 1) model is given by 

yt|J-t_i ~Pois(At), At = (5 + aAt_i+/3Ft_i, (4.6) 

where (5 > 0, a, /3 > are parameters. According to PropositionSl it is easy to see that if a + /3 < 1, 
then {Aj} is geometric moment contracting and has a unique stationary distribution vr; moreover 
if Ai ~ vr, then {{Yt,\t),t > 1} is an ergodic stationary process. As for inference, the MLE 9n 

is strongly consistent and asymptotically normal according to Theorem [3l i.e., y/n{6n — Oq) — > 
N{0, 0.^^), as n — )• oo, where Q = E| 1 / At (dXt I dO) (dXt /dO)'^}. To see this, we only need to verify 
assumptions (LI) and (L2). Note that by Fokianos et al. (20091 ). we have 7y(0) = {1 - (a + pf + 
- (a + pf] and 7y(/i) = /iC(0)(a + pf-^ for h>l, where = EYt = 5/{l - a - P) and 
C{9) is a positive constant dependent on 9. Hence by monotone convergence theorem, we have 

oo oo 

E[yi \og{5u/e + Y,{1 - efY^^k]] < mA^u/e + ^^(1 - e)'=n-fe}] 

k=l k=l 

= ^EYi + ^{l-e)''EYiYi^k 

k=l 

oo 



k=l 
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Hence assumption (LI) holds according to Remark[2l Notice that B{rjt) = > A* := S/{1 — a) for 
ah t, so A"{rjt) = e''* is bounded away from 0, so assumption (L2) holds according to Remark [3l 

Moreover, the iterated random function approach can be used to study the properties of IN- 
GARCH models with higher orders. A Poisson INGARCH(p, q) model takes the form 

p g 

Yt\J^t_i ~ Pois(At), Xt = 6 + ^aiAi_i + (4.7) 

i=i j=i 

where S > 0, Ui, I3j > 0,i = 1, . . . ,p; j = I, . . . , q. Applying similar ideas as in the INGARCH(1, 1) 
case, we have the following stationarity result. 

Proposition 5. Consider the INGARCH(p, q) model (j4.7p and suppose Yl\=i + Sj=i f^i ^ 
then {A^} is geometric moment contracting and has a unique stationary distribution. 

Example 2. The negative binomial INGARCH(1, 1) model (NB-INGARCH) is defined as 

Yt\Pt^i^^B{r,pt), Xt = 5 + aXt-i + pYt-i, (4.8) 

where Xt = r(l — pt)/pt^ ^ > 0, a, /? > are parameters and the notation Y ~ NB(r,p) represents 
the negative binomial distribution with probability mass function given by 



P{Y = k)=^ J(l-p)V, fc = 0,l,2,.... 

When r = 1, the conditional distribution of Yt becomes geometric distribution with probability of 
success pt, in which case (j4.8p reduces to a geometric INGARCH model. 

By virtue of Proposition HI if a + /3 < 1, then {Xt,t > 1} is a geometric moment contracting 
Markov chain, and has a unique stationary distribution vr; and when Xi ~ vr, {{Xt,Yt),t > 1} 
is ergodic. As for inference, we can first estimate 9 = {6, a, /3)^ for r fixed and calculate the 
profile likelihood as a function of r. Then r is estimated by choosing the one which maximizes the 
profile likelihood, and thus 9 can be otained correspondingly. Moreover, if we assume r is known 
and (a + /3)^ + < 1, then under assumption (LO), the maximum likelihood estimator 9n is 
strongly consistent and asymptotically normal with mean ^^nd covariance matrix /n, where 
n = E{r/Xt/iXt + r){dXt/d9){dXt/d9f}. Verification of assumptions (LI) and (L2) is sufficient 
to demonstrate the result. Since B~^{x) = log{x/(x + r)} < 0, so assumption (LI) holds according 
to Remark[2j Note that A"{rit) = re''*/(l — e''*)^ is increasing, so assumption (L2) holds provided 
7y(0) < oo according to Remark[3l Because Var(Xi) = a^Var(Xo) + /3^Var(yo) + 2a/3Cov(Xo, Iq)) 
where 

Var(yo) = E{Var(yo|^o)} + Var{E(yo|^o)} 

= E{r(l - po)/pI} + Var(Xo) = /i + l/rEX^ + Var(Xo), 

and Cov(Xi,yi) = Eyi^i — = EXf — fj? = Var(Xi), it follows from the stationarity that 

^^'^""'^ = l-(a + /3)2-/3Vr- 
Hence 7y(0) < oo provided (a + /3)^ + < 1. 

Example 3. We define the binomial INGARCH(1, 1) model as 

Yt\Tt-i ~ B{m,pt), mpt = 6 + ampt-i + /3Yt-i, (4.9) 
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where 6 > 0,a, /3 > are parameters and 6 + am + /3m < m since pt G (0, 1). This imphes the 
contraction condition a+f3 < 1. In particular, when m = 1, it models time series of binary data, and 
is called a Bernouhi INGARCH model. If (5 + am + /3m < m, then {Xt = mpt, t > 1} is geometric 
moment contracting and has a unique stationary distribution vr; furthermore, {(Xt,Yt),t > 1} is 
ergodic when Xi ~ vr. 

We now consider the inference of the model. Firstly, because of the special constraint pt G (0, 1), 
the parameter space becomes 

e = {(5, a,f3f :0<5L<S<5u,e<a + /3<l-e} for some e > 5u/m. 

Since yj < m, so Xi{9) < (5+am)/(l-Q) and -6-^X1(0)) < log{{6u + {l-e)m)/{em-6u)}. Hence 
assumption (LI) holds. Notice that A"{r]t) = mpt{l — Pt) and pt G [6u/m, {6 + f3m)/{m{l — a))] C 
[0, 1], so A"{r]t) is bounded away from 0. Similar to the proof in Example [2l one can show that 
7y(0) < 00 provided that (a + /3)^ + /3^/m < 1. So assuming m is known and (a + /3)^ + f5'^/m < 1, 
the maximum likelihood estimator 9n is strongly consistent and asymptotically normal with mean 
60 and covariance matrix Q-'^/n, where Q = E{m/Xt/{m - Xt){dXt/de){dXt/deY}. 

Example 4- The gamma INGARCH model, which has a continuous response, is given by 

Yt\Tt^i^T{K,st), St = 5 / 1^ + ast-i + 13 / KYt-i, (4.10) 

where k and st are the shape and scale parameters of the gamma distribution respectively and 
(5 > 0,a,/3 > are parameters. Here the natural parameter is rji = —1/st and the Markov chain 
Xt = B{r]t) = —K./r]t. If a + /3 < 1, then {Xt = KSt,t > 1} is geometric moment contracting and has 
a unique stationary distribution vr; furthermore, {{Yt, Xt),t > 1} is an ergodic stationary process 
if Xi ~ IT. 

As for the inference in this model, assume k is known and {a + /3)^ + /3^/k < 1. Then the 
maximum likelihood estimator 0„ is strongly consistent and asymptotically normal with mean 
00 and covariance matrix Q,~'^/n where Q. = E{K/sf{dst/d6){dst/d6)'^}. To see this, note that 
B~^{x) = —k/x < when x > 0, which verifies assumption (LI) according to Remark [2l Similar 
to the proof in Example [21 one can show that 7y(0) = (1/k + l)7x(0) + /x^/k and 7x(0) = 
(/3^^^/k)/{1 - (a + /3)^ - /3^/k}. Hence as long as (a + /3)^ + /3^/k < 1, we have 7y(0) < 00. Since 
A"{r]t) = i^/^t — '^i/^ ^ assumption (L2) holds according to Remark[3l 



4.2 Nonlinear dynamic models 

It is possible to generalize (|4.ip to nonlinea r dynamic models. O ne approach is based on the idea 
of spline basis functions, see for example, iRuppert et al. In this framework, the model 



specification is given by 

K 



Yt\Ft^i^p{y\m), Xt = 5 + aXt^i+PYt^i + Y.l^k{Yt^^-ik)^: (4.11) 



fc=i 



where K G No, 5 > Q,a, fi > 0, Pi, . . . , /3k are parameters, {^k}k=i the so-called knots, and 
is the positive part of x. In particular, when K = 0, (|4.1ip reduces to the linear model 
()4.ip . It is easy to see that model (14. IIP is a special case of model ()2.2p by defining gg{x,y) = 
5 + ax + I3y + '}2ik=i Pk{y — ^k)^ ■, where = (5, q, /3, /3i, . . . , f3K)'^- Note that in each of the pieces 
segmented by the knots, ()4.1ip has INGARCH-like dynamics. For example, if It-i G [.^s,^s+i) 
for some s < K, then Xt = {5 — T^.l-i 3kS.k) + Q^t-i + ( (3 + X^fc=i f3k)Yt-i. This can be viewed 



as one of the generalizations (e.g., Samia and Chan (20ld )) to the threshold autoregressive model 
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(jTong (199d )). According to Propositions [H [2]and[3l we can establish the stabihty properties of 
the model. 



Proposition 6. Consider model (|4.1ip with parameters satisfying a + /3 < 1, /9 + X]fc=i /^fc > 
and a + /3 + /3fc < 1 for s = 1, . . . ,K, then {Xt} is geometric moment contracting and has a 

unique stationary distribution vr. Moreover if Xi ~ vr, then {{Xt,Yt),t > 1} is ergodic. 

We now consider inference for this model. Assume the knots {S,k}k=i known for K fixed. 
Then the parameter vector 6 = {6,a, (3, . . . , (^k)'^ can be estimated by maximizing the condi- 
tional log-likelihood function, which is available according to (j3.ip . The number of knots K can be 
selected by virtue of an information criteria, such as AIC and BIC. As for the locations of knots, 
there are different strategies one can adopt for choosing them. One method is to place the knots at 
the {j/{K + 1), j = 1, . . . ,K} quantiles of the population, which can be estimated from the data. 
A second method is to choose the locations that maximize the log likelihood. We will employ both 
procedures to real datasets in the next section. 

To study the asymptotic behavior of the estimates, first note that by iterating the recursion, 

oo K oo 

i=0 k=l i=0 

oo K 

= + + (4.12) 

i=0 k=l 

This defines the function as in Xt = g^{Yt-i,Yt-2-, ■ ■ ■) and also verifies assumptions (Al)-(A3). 
Hence in order to apply Theorem [3l we only need to impose the following regularity assumptions 
for the nonlinear model (|4.1ip : 

(NLl) ^0 is an interior point in the parameter space 0, which is a compact subset of the parameter 
set satisfying the conditions in Proposition [6j 

oo K 

(NLl) F.[YlSViY>B-\{5/{l-a) + y^a'{!3Yt^l^^ + y^f3k{Yt-l-^-^k)^])]<^■ 

(NL2) ^[B'{i^M){dr,i{e)/dei)Y\e=e^ < oo, for i = 1, . . . , + 3. 

Sufficient conditions for assumptions (NLl) and (NL2) can be established similarly to those given in 
Remarks [2] and [3l The asymptotic properties of the MLE are summarized in the following theorem. 

Theorem 4- For model (|4.1ip . suppose that the placement of the knots is known, and that 
assumptions (NL0)-(NL2) hold, then the maximum likelihood estimator 9n is strongly consistent 
and asymptotically normal, i.e., 

as n ^ oo, 

where O. = Fi{B' {rit)'r]t'nT} ■ 

We use the Poisson nonlinear dynamic model as an illustrative example of the above results and 
refer readers to Section 5 for implementation of the estimation procedure. The model is defined as 

K 

Yt\J^t^i ~ Pois(AO, Xt = S + aXt^i + m~i + J^/3fc(lt-i - 6)+. (4.13) 

k=l 
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It follows that under the conditions of Proposition [6] and Theorem [5] that {{Xt,Yt),t > 1} is 
a stationary and ergodic process, and the estimates are strongly consistent and asymptotically 
normal. In practice the covariance matrix of the estimates can be obtained by recursively applying 

Another exa mple of nonlinear dy namic models is the Poisson exponential autoregressive model 
proposed by iFokianos et al. and it is given by 



y*! ~ Pois(At), At = (ao + aiexp{-7A2_J)At_i +/3yt„i, (4.14) 

where aQ,ai,l3,^ > are parameters. We point out that if ao + Q?i + /? < 1, then model (j4.14|) 
belongs to the class of models (j2.2p and hence enjoys the stabil ity properties stated i n Propositions 
[1] and [21 As for the inference of the model, we refer readers to iFokianos et al. for details. 



5 Numerical results 

The performance of the estimation procedure for the Poisson nonlinear dynamic model is illus- 
trated in a simulation study. The MLE is obtained by optimizing the log-likelihood function (13. ip 
using a Newton-Raph son method. Simulation results of the Poisson INGARCH can be found in 
Fokianos et al. Other models including the negative binomial linear and nonlinear dynamic 

models and the exponential autoregressive model (j4.14p will be applied to two real datasets, and 
tools for checking goodness of fit will be considered. 



5.1 Simulation for the nonlinear model 

As specified in (j4.13p . a 1-knot nonlinear dynamic model is simulated according to 

Yt\J^t^i ~ Pois(At), Xt = 0.5 + 0.5At_i + 0.4Yt^i - 0.2{Yt^i - 5)+ 

with different sample sizes. Each sample size and parameter configuration is replicated 1000 times. 
For each realization, the first 500 simulated observations are discarded as burn-in in order to let the 
process reach its stationary regime. We first estimate the parameters assuming that the location of 
the knot is known, i.e., the true underlying model is ()4.1ip with only one knot at 5. The means and 
standard errors of the estimates from all 1000 runs are summarized in Table [1] and the histograms 
of the estimates are depicted in Figure [H The performance of these estimates is reasonably good 
and consistent with the theory described in Theorem HI As for estimating the parameters without 
knowing the location of the knots, the corresponding results of the MLE obtained by fitting a 1- 
knot model to all the 1000 replications are summarized in Table [2j Here the locations of the knots 
are determined by sample quantiles. Not surprisingly, the performance of the maximum likelihood 
estimates of /? and /3i is not as good as in the known knot case. However, the overall model 
performance, as reflected in the computation of the scoring rules (described in the next section), 
is competitive with the known knot case. For instance when n = 1000, the means of ranked 
probability scores (RPS) for known and unknown knot cases are 1.0906 and 1.0914, respectively. 

Next we turn to the problem of selecting the number of knots using an information criterion. 
Simulations with different sample sizes are implemented and the model selection results are sum- 
marized in Table [3l Numbers in the table stand for the proportion of times that each particular 
model is selected in the 1000 runs. For AIC, the 1-knot model is selected most often followed by a 
2-knot model, at least in the cases when n = 1000. 
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Table 1: Estimation results for 1-knot model with known knot location 





5 


a 


/3 


/3i 


n 


True 


0.5 


0.5 


0.4 


-0.2 




Estimates 


0.5596 


0.4861 


0.3990 


-0.2009 


500 


s.e. 


(0.0087) 


(0.0030) 


(0.0026) 


(0.0051) 




Estimates 


0.5265 


0.4944 


0.3991 


-0.2016 


1000 


s.e. 


(0.0041) 


(0.0016) 


(0.0013) 


(0.0025) 





Table 2: Estimation for 1-knot model with unknown knot location 





S 


a 


/3 


/3i 


n 


True 


0.5 


0.5 


0.4 


-0.2 




Estimates 


0.5387 


0.4852 


0.4187 


-0.1614 


500 


s.e. 


(0.0089) 


(0.0030) 


(0.0031) 


(0.0047) 




Estimates 


0.5002 


0.4943 


0.4197 


-0.1679 


1000 


s.e. 


(0.0042) 


(0.0016) 


(0.0015) 


(0.0023) 






.t; CD 

03 



histogram of a 



0.35 



0.45 



0.55 



0.65 



histogram of p 



histogram of (31 



Q 
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-0.4 -0.3 -0.2 -0.1 



P1 



0.0 



Figure 1: Histograms of the 1-knot model with sample size 1000 assuming the knot is known. The 
overlaying curves are the density estimates and the dashed vertical lines represent the true values 
of the parameters. 
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Table 3: Model selection of 1-knot simulation 



Criteria 


knot 1 knot 2 knots 3 knots > 4 knots n 


AIC 
BIC 


34.3% 37.6% 20.9% 5.2% 2.0% 500 
80.5% 18.8% 0.6% 0.1% 


AIC 
BIC 


12.4% 45.0% 29.9% 8.3% 4.4% 1000 
59.4% 38.4% 2.0% 0.2% 



Table 4: Model selection results for Ericsson data 





0-knot 


1-knot 


2-knot 


3- knot 


4- knot 


5-knot 


LogL 


-1433.19 


-1431.21 


-1431.08 


-1430.58 


-1429.65 


-1431.12 


AIC 


2874.38 


2872.41 


2874.17 


2875.17 


2875.30 


2880.25 


BIC 


2890.90 


2893.07 


2898.95 


2904.08 


2908.35 


2917.43 



In light of the idea of interpolating the nonlinear dynamic of Aj by a piecewise linear function, 
we plot in Figure [2] the fitted functions + Ylk=i ^k{y — ik)~^ for each run of the simulations 
against its true form 0.4y — 0.2(y — 5)"^. From the graph, we can see that the piecewise linear 
function fitted by the 1-knot model is closest to the true curve. 

5.2 Two data applications 

1. Number of transactions of Ericsson stock 

As an illustrative example, both linear and nonlinear dynamic models are employed to fit the 
number of transactions per minute for the stock Ericsson B during July 2nd, 2002 which consists of 
460 observations. Figure[3]plots the data and the autocorrelation function. The positive dependence 
displayed in the data suggests the application of the models in our study. 

By computing the MLE of the parameters, the fitted Poisson INGARCH model is given by 

Ai = 0.2912 + O.8312V1 +0.1395yj_i, 
(0.1000) (0.0242) (0.0188) 

and the fitted NB-INGARCH model is 

yt|J-t_i ~NB(8,pt), Xt = 0.2676 + 0.8447Xt_i +0.1282yt_i, 

(0.1406) (0.0350) (0.0274) 

where Xt = 8(1 —pt)/pt- The standard deviations in the parentheses are calculated according to 
the remark after Theorem [2l 

As for the Poisson nonlinear dynamic model, AIC and BIC are used to help select the number 
of knots among to 5; the values are reported in Table HI The fitted 1-knot Poisson model, which 
has the smallest AIC, is given by 

\^ = 0.5837 + 0.8319Vi + 0.0906yt_i + 0.0722(yf_i - 9)+. 
(0.1884) (0.0241) (0.0295) (0.0373) 

Note that the AIC values of the 2-knot and 3-knot models are both close to that of the 1-knot model, 
and therefore are used as a basis for comparison with the minimum AIC model. These models 
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5 10 15 20 5 10 15 20 



Y Y 

Figure 2: Left: the black curve is the true function OAy — 0.2{y — 5)"^, and the other curves are 
the piecewise linear functions fitted in each simulation where the number of knots K is selected via 
AIC; Right: for each value of K, we plot the fitted curve from one specific run that chooses the 
particular number of knots. 
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stock Ericsson B July 2nd 2002 




Time 
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Lag 



Figure 3: Top: Number of transactions per minute of the stock Ericsson B during July 2nd 2002; 
Bottom: ACF of the data. 



are given by At = 0.5519 + 0.8326At_i + 0.096iyt_i + 0.0154(yt„i - 7)+ + 0.0559(yj_i - 11)+ and 
At = 0.3614 + 0.8361At_i +0.1206^4-1 +0.0433(yt_i -6)+ -0.0914(yt-i-9)+ + 0.0914(yt-i- 13)+, 
respectively. 

As can be seen from the model checking below, the negative binomial INGARCH model seems 
to outperform the Poisson-based models. This could be explained by the over-dispersion exhibited 
by the data, since the mean and variance are 9.91 and 32.84, respectively. To this end, we fit 
the nonlinear negative binomial models and select the number of knots by minimizing the AlC. It 
turns out that the AIC value of a 1-knot model is the second smallest among all the candidates, 
with 2674.69 compared to the smallest value 2674.04, which is attained by the negative binomial 
INGARCH model fitted above. The fitted 1-knot negative binomial nonlinear model is given by 
Yt\Ft-i ~ NB(8,pt), where Xt = 8(1 -pt)M follows 

Xt = 0.4931 + 0.8444Xt_i + 0.0903yt~i + 0.0603(^4-1 - 9)+. 
(0.2559) (0.0350) (0.0412) (0.0546) 

Here the locations of knots for the nonlinear dynamic model are all estimated by the corresponding 
sample quantiles. We also tried estimating the knots by maximizing the likelihood, and in this 
application, the results by both methods ar e nearly identical. T he exponential autoregressive 
model ()4.14p is also applied to this dataset by Fokianos et al. (20091 ) and is given by 



At = (0.8303 + 7.030exp{-0.1675A?„i})At_i + 0.155iyt-i. 
(0.0232) (3.0732) (0.0592) (0.0218) 

To assess the adequacy of the fit by all of the above models, we will consider an array of 
graphical and quantitative diagnostic tools for time series, some of wh ich are specifically designed 
for time series of counts. Readers can refer to Davis et al. (2m± and I.Tung and Tremavne (201lh 
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1-knot NB model for Ericsson data 




ACF of standardized Pearson residuals 
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Figure 4: Top: Dotted curve represents the number of transactions of Ericsson stock, and the 
black curve is the fitted conditional mean process by 1-knot NB-based model; Bottom: ACF of the 
standardized Pearson residuals. 



for a comprehensive treatment of the tools. In our study, we first consider the standardized Pearson 
residuals et = (Ij — 'Ei{Yt\Ft-i)) / ^\ai{Yt\^t_i) which can be obtained by replacing the population 
quantities by their estimated counterparts. If the model is correctly specified, then the residuals 
{et} should be a white noise sequence with constant variance. It turns out that all the models 
considered above give very similar fitted conditional mean processes and the standardized Pearson 
residuals appear to be white. Figure H] displays the fitted result for the 1-knot negative binomial 
model. 

Another tool for model checking is through the probability integral transform (PIT). When 
the underlying distribution is continuous, it is well known that the PIT follows standard uniform 
distribution. However, if the underlying distribution is discrete, some adjustments are required and 
the so-called randomized PIT is therefore int roduced by pertu rbing the step fu nction characteristic 
of the CDF of discrete random variables (see Brockwell (2007 )). More recently, Czado et al. (20091 ) 



proposed a non-randomized version of PIT as an alternative adjustment. Since it usually gives the 
same conclusion for model checking, we do not provide the non-randomized version here. For any 
t, the randomized PIT is defined by 

ut := Ft{Yt -l) + vt [Ft{Yt) - Ft{Yt - 1)] , 

where {ft} is a sequence of iid uniform (0, 1) random variables, Ft{-) is the predictive cumulative 
distribution. In our situation, Ft{-) is simply the CDF of a Poisson or a negative binomial dis- 
tribution. If the model is correct, then ut is an iid sequence of uniform (0, 1) random variables. 
Jung and Tremavne (201ll ) reviewed several ways to depict this and we adopt their method in our 



study. To test if the PIT follows (0, 1) uniform distribution, the histograms of PIT from different 
models are plotted and a Kolmogorov-Smirnov test is carried out. The results are summarized in 
Figure El and the p- values are reported in Table El It can be seen that both of the two negative 
binomial-based models pass the PIT test, while none of the Poisson-based models does. This ob- 
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PIT of Poisson INGARCH 




PIT of NB-INGARCH 




PIT of 1-knot Poisson model 






Figure 5: Left: histograms of randomized PIT's for all of the models fitted to the Ericsson stock 
data; Right: QQ-plots of ut against standard uniform distribution for the corresponding models, 
where the straight line is the 45° line with zero intercept. 



servation could be explained, as mentioned above, by the over-dispersion phenomenon of the data. 



To measure th e power of predict ions by models, various scoring rules have been proposed in 
literature, see e.g., Czado et al. and lJung and Tremavne (201 ll ). Most of them are computed 

as the average of quantities related to predictions and take the form (n — 1)~^ Yl^=2 ^i^tiYt)) where 
Ft{-) is the CDF of the prediction distribution and s(-) denotes some scoring rule. In this paper we 
calculate three scoring rules: logarithmic score (LS), quadratic score (QS) and ranked probability 
score (RPS), as a basis for evaluating the rela tive performance of our fitted models. For definition 
of these scores, see I Jung and Tremavne f2nilh . Table El summarizes these scores for all of the fitted 
models. As seen from the table, most of the diagnostic tools favor the one-knot negative binomial 
model for the Ericsson data. 



2. Return times of extreme events of Goldman Sachs Group (GS) stock 

As a second example, we construct a time series based on daily log-returns of Goldman Sachs Group 
(GS) stock from May 4th, 1999 to March 16th, 2012. We first calculate the hitting times, n, T2, . . ., 
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Table 5: Quantitative model checking for Ericsson data 





iU^ lliYcIliiUULl 


n T In 1 o r~\T V-^ \ \ 
jj VOiUJLxi Ui JT 1 J. 


T S 






Poisson INGARCH 


-1433.19 


< 10-5 


3.1167 


-0.0576 


2.6883 


NB INGARCH 


-1332.02 


0.7386 


2.8958 


-0.0671 


2.6063 


1-knot Poisson model 


-1431.21 


< 10-5 


3.1123 


-0.0573 


2.6848 


2-l<not Poisson model 


-1431.08 


< 10-5 


3.1121 


-0.0575 


2.6843 


3-knot Poisson model 


-1430.58 


< 10-5 


3.1110 


-0.0580 


2.6779 


1-knot NB model 


-1331.34 


0.8494 


2.8942 


-0.0671 


2.6021 


Exp-auto model 


-1448.69 


< 10-5 


3.1504 


-0.0600 


2.6924 



for which the log-returns of GS stock falls outside the 0.05 and 0.95 quantiles of the data. The 
discrete time series of interest will be the return (or inter-arrival) times Yt = Tt — Tt-i- If the data 
are in fact iid, or do not exhibit clustering of large values, then the l^'s shoul d be independent and 
geometrically distributed with probability of success p = 0.1 (jChang f20ld )l. Figure [6] plots the 



return times of the stock, and the ACF and histogram of the return times. Note that in order to 
ameliorate the visual effect of some extremely large observations, the time series is also plotted in 
the top right panel of Figure [6] on a reduced vertical scale, in which it is truncated at 80 and the 
five observations that are affected are depicted by solid triangles. 

To explore this time series, three models: the geometric INGARCH (negative binomial IN- 
GARCH (j4.8p with r = 1), and the 1-knot and 2-knot geometric-based models are fitted to the 
data. The number of knots for the nonlinear dynamic models is chosen by minimizing the AIC, 
and the locations of knots are estimated by maximizing the likelihood based on a grid search. In 
addition, the following constraint is imposed: there should be at least 30 observations in each of 
the regimes segmented by the knots in order to guarantee that there are sufficient observations 
to obtain quality estimates of the parameters. The sample quantile method for estimating knot 
locations did not perform as well. 

Since it follows from the definition of return times that It > 1 for any t, we use a version of 
the geometric distribution that counts the total number of trials, instead of only the failures. In 
particular, the fitted 1-knot geometric-based model is given by 1^ — l\Ft-i ~ Geom(pt), where 

Xt = 0.5042 + 0.4729Xt_i + 0.5271(yt_i - 1) - 0.0526(yt_i - 5)+, 

and the fitted 2-knot geometric-based model is 

Xt = 0.5414 + 0.4531Xt_i + 0.5469yt_i - 0.2333(yt_i - 9)+ + 0.2332(yt_i - 18)+, 

where Xt = (1 —pt)/pt- Notice that in both models, a + (3 \s very close to unity, i.e., the estimated 
parameters are close to the boundary of the parameter space. This is similar to the integrated 
GARCH (IGARCH) model in which a + f3 = 1. In our application, the mean of the time series of 
return times is about 10, while the variance is 1101. A simple simulation according to the fitted 
model yields the mean and median very close to those of the data, but the variance of the simulated 
data is extraordinarily large, which resembles the feature of the observed data. This is because, 
although the fitted models are still stationary, the parameters no longer satisfy the conditions 
specified in Theorem [J] that ensure a finite variance. 

It turns out that the geometric-based models fitted above are capable of capturing the high 
volatility part of the data. Their standardized Pearson residuals are also calculated and appear 
to be white. Results of the PIT test are depicted in Figure El and the prediction scores and the 
p- values of the PIT test are summarized in Table [6l Two Poisson-based models are also included 
for comparison, and as expected, they do not perform as well as the geometric-based models. 
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Figure 6: Top left: Return times of GS stock, the dashed horizontal line locates at 80; Top right: 
Return times truncated at 80 in order to ameliorate the visual effect of the five large observations 
that are represented by solid triangles; Bottom left: ACF of the return times; Bottom right: 
Histogram of the return times, where the curve overlaid is the density function of a geometric 
distribution with p = 0.1. 



Table 6: Quantitative model checking for GS return times 



Model 


log likelihood 


p- value of PIT 


LS 


QS 


RPS 


Poisson INGARCH 


-2681.06 


< 10-^ 


8.2842 


-0.0675 


4.1373 


Geom INGARCH 


-857.73 


0.2581 


2.6477 


-0.1436 


3.4100 


3-knot Poisson model 


-2670.33 


< IQ-^ 


8.2510 


-0.0693 


4.1400 


1-knot Geom model 


-857.58 


0.3988 


2.6472 


-0.1436 


3.4041 


2-knot Geom model 


-857.42 


0.2006 


2.6468 


-0.1435 


3.3939 
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PIT of 3-knot Poisson model 
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Figure 7: Left: histograms of randomized PIT's for the models fitted to GS return times; Right: 
QQ-plots of Ut against standard uniform distribution for the corresponding models, where the 
straight line is the 45° line with zero intercept. 
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Appendix A. Properties of the exponential family 

An important property of the one-parameter exponential family that is heavily used in this paper 
is the stochastic monotonicity. A random variable X is said to be stochastically smaller than a 
random variable Y (written as X <st Y) if F{x) > G{x) for all x, where F{x) and G(x) are the 



cumulative distribution functions of X and Y respectively. We refer readers to lYu (20091 ) for the 
related theory. 

Proposition 7. Suppose two random variables Y' and Y" follow distributions belonging to the 
one-parameter exponential family (j2.ip with the same A, h and /x, but with natural parameters r/' 
and 77" respectively. If r/' < r?", then Y' is stochastically smaller than Y" . 



Proof. Denote the probability density functions of Y' and Y" as p{y\r]') and p{y\r]") defined in (j2.1 
respectively. Then the log ratio of the two densities is 

= log ^(^1^') = log - Mv')}h{y) 



p{y\v") exp{r]"y - A{rj")}h{y) 

y{n'-v") + [A{ri")-A{v% 



which is apparently a concave function in y. So it follows from Definition 2 in lYu f2009l ) that Y 



is 



log concave relative to Y" , i.e., Y' <ic Y". Moreover, since A^rj) is increasing in r/, so lim^j^o Ku) — 
^ for contin uous p(y\r]), andp{0\r]')/p{0\r]") > 1 for discrete p(y|ry). Hence according 
to Theorem 1 in lVu (20091 ). Y' is stochastically smaller than Y", i.e., Y' <st Y". □ 

Denote Fx as the cumulative distribution function of piylr]) in (j2.ip with x = B{rj), and its 
inverse F~^{u) := inf{t > : Fx{t) > u} for u E [0, 1]. The result below provides a useful tool for 
the coupling technique employed to establish mixing conditions for the observation process. 

Proposition 8. Suppose that [/ is a uniform (0, 1) random variable, and define two random 
variables Y' and Y" as 



Y' = F;,\U) and Y" = F;,}{U), 
where x' = B{r]') and x" = B{r]"). Then E\Y' - Y"\ = \x- 



X" 



Proof. It follows from the construction of Y' and Y" that they follow the one-parameter exponential 
family (j2.ip with natural parameters r/' and r/" respectively, and EY' = x' , EY" = x" . If x' < x" , 
then Y' is stochastically smaller than Y" by virtue of Proposition[71 It follows that F~,^{9) < F~,}{6) 
for e e (0, 1), i.e., Y' < Y". This implies E\Y' - Y"\ = E{Y" - Y') = x" - x' . Similarly if x' > x" , 
then E\Y' - Y"\ = x' - x". Hence we have E\Y' - Y"\ = \x' - x"\. □ 

Appendix B. Proofs 
B.l. Proof of Proposition [1] 

It suffices to verify the two conditions formulated in IWu and Shao f2004l l. For any yo in the 



state space S, E\yo - /„(yo)| = Jq \yo - g{yo, Fy^,^{u))\du < yo + 5(0,0) + ayo + b Fy^^(u)du < 
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^(0, 0) + (1 + a + b)yo < oo. Next for a fixed xq G S, there exists a unique ijq sucli that xq = B{riQ) 
due to the strict monotonicity of B{r]). For any x > xq, there exists a unique r] > ijq such that 
X = B{r]) > B[r]Q) = xq. Hence by the contraction condition ()2.3p . we have 

E|Xi(x)-Xi(xo)| = / \g{x,F-\u))-g{xo,F-^\u))\du 

Jo 

< a\x-xo\+b \F-^{u) - F-^^{u)\du. (5.1) 
Jo 

It follows from x > xq and Proposition [7] that for any u £ (0, 1), F~^^{u) < F^^(u). Therefore 

E|Xi(x) - Xi(xo)| < a{x-xo) + b{[ F~^{u)du-[ F~^^{u)du} 

Jo Jo 

= {a + b){x-xo). 

Similarly for x < xq, we have E|Xi(a;) — Xi{xq)\ < (a + b){xo — x). So for any x £ 5*, we have 
E|Xi(x) — Xi(xo)| < (a + b)\x — xq\. Now suppose E|X„(x) — X„(xo)| < (a + b)^\x — xo\, then 

E|X„+i(a-) - X„+i(a;o)| = E[E{|X„+i(X„(a;)) - X„+i(X„(a;o))| . . . , t/„}] 

< E{(a + &)|X„(a;)-X„(xo)|} 

< (a + 6)"+i|cc-a;o|. 

By induction, {Xt} is geometric moment contracting and as a result, tt is its unique stationary 
distribution. 

To show that Et^^i < oo, notice that by taking conditional expectation on both sides of ()2.4p . 
we have E(Xt|Xt_i) < g{0, 0) + (a + b)Xt^i. Inductively one can show that for any t > 1, 

E(X,|Xi) < i-ll^±^5(0,0) + ia + br'X^. 
I - [a + b) 

c c 
Since for any x £ 5, Xt (x) — > Xi ~ vr as t — )■ oo, in particular, Xt{0) — > Xi ~ tt, so by Theorem 

3.4 in iBillingslev (1993 ) we have 

E^Xi < liminf E(Xt|Xi = 0) < ^^^'^^ , < oo. 
t-S'oo 1 — [a + b) 

To prove (c), let {£,t,t > 1} be a sequence of independent uniform (0,1) random variables and 
independent of {Xt,t > 1}, then Yt = F^^(^j). Since {{Xt,^t),t > 1} is a stationary sequence if 
Xi ~ vr, so {Yt,t > 1} must also be a stationary process. 

B.2. Proof of Proposition [2] 

Define a sequence of functions {5^, A; > 1} in a way such that gi = g, and for > 2, ^^(x, yi, . . . , y^) = 
Qk-iigix, yfc)i 2/1) • • • ) Uk-i)- Then it follows from (|2.2p that for all t £ Z, 

Xt = gk{Xt-k,yt-i, ■ ■ ■ , Yt-k)- 

By virtue of the contraction condition ()2.3p . we have E\Xt — 51(0, Yj_i)| = E|yi(Xi_i, lt_i) — 
gi{0,Yt^i)\ < aEXt-i- By induction, it follows that for any k > 1, 



E\Xt-gk{0,Yt-u...,Yt^k)\ < EX, 



t-k- 
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^1 

Since ^^^i < oo, it follows that 5^(0, i^-i, . . . , Yt^^) — ^ -'^t) as A; — )■ oo. Hence there exists a mea- 
surable function ^oo : N(f = {(ni,n2, . . .),ni G Nq} — > [0,oo) such that Xt = (7oo(^t-i; ^4-2, • • •) 
almost surely, which proves (a). 

To prove (b), denote J-^^ = a{Yk, . . . ,Yi} for — oo < k < I < oo. Then the coefficients of 
absolute regularity of the stationary count process {It, t £ Z} are defined as 

/3(n) = E{ sup |P(^|^roo,o)-m)|}, 

A a 

jr^-^j n,oo 

where -^^^000 ~ i^jXi, Yq) ^-ij ■ • •} according to (a). Because the distribution of y^+i, . . .) 
given c7{Xi, ^-1) • • •} is the same as that of . . .) given X\ for ?i > 1, the coefficients 

of absolute regularity become 

/3(n) = E{ sup \P{A\a{X^,YQ,Y_^,..?^)-P{A)\] 

jr^-^j n.oo 

= E{ sup \P{A\X{)-P{A)W. (5.2) 

^^•^ n.oo 

Let B'^ be the <T-field in ]R°° generated by the cylinder sets, then we can rewrite the coefficients of 
absolute regularity as 

/3(n) = E| sup |p((y„,y„+i,...) G -p((y„,y„+i,...) G (5.3) 

We will provide an upper bound for (15. 3p by coupling two chains {(X^, y^), n G Z} and {(X^', y"), n G 
Z} defined on a common probability space. Assume that both chains start from the stationary dis- 
tribution, that is, X( ~ vr, X'l ~ vr and that X( is independent of X'^. Let {C/fc, /c G Z} as be an iid 
sequence of uniform (0, 1) random variables, and construct the chains as follows: 

-'^n = 9{^n~\^^^i (f^n-l)) , y" = F^^,{Un)- 

Since X( and X'{ are independent, so for any A G 

p((y:, KVi, . . .) G = p((y„, y„+i, ...)gA). 

Hence we have 

|p((y„, y„+i, . . .) G ^|Xi = x) - p((y„, y„+i, . . .) g a) | 
= |p((y,:, K+i, . . .) G = x) - p((y:, y:+i, . . .) G A\x[ = x) I 
< p((K,K+i,...)^(y:,y:+i,...)|x( = x). (5.4) 

Therefore the coefficients of absolute regularity are bounded by 

oo 

/3(n) < p((y^,y^+i,...) / (y:,y^Vi,...)) < j;p(y^+, / y^%). (5.5) 

fc=0 

Observe that the constructi on of the two chains agrees with the definition of geometric moment 
contraction (Definition 1 in Wu and Shao (20041 )). so it follows from Proposition [T] that E|X^ — 
Xli\ < (a + 6)" for all n. Then 

p(K/y:) = E{p(y^/y;|x„,X')} = E{p(|y^-y;i>i|x„,<')} 

< E{E\Y; - y:||x;,X;')} = EK - X:\ < (a + br- 
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Hence according to (j5.5p . the coefficients of absolute regularity satisfy /3(n) < Yl'h=o(^ + b)'^'^'' = 
(a + b)"'/(l — ( a + b)). Recall the well-known fact that /^-mixing implies strong mixing (e.g., 
Doukhan (19941 )). so {Yt,t > 1} is stationary and strongly mixing at geometric rate, in fact, it 
is ergodic. In particular, {Yt,t > 1} is an ergodic stationary process. It follows from Xt = 
goo(Yt^i,Yt^2, ■ ■ •) that {Xt,t > 1} is also ergodic. 

B.3. Proof of Proposition [3] 

The proof utilizes the classic Markov chain theory, see for example Mevn and Tweedie (200^ ) . ( 



a 



follows from the same argument as in the proof of Proposition [2j As for (b), for any fixed e > 0, 
define (j) as Lebesgue measure on [x*,oo), where x* = (g(0,0) + be)/{l — a), and let A be a set 
with 0(^) > 0. To prove the (/>— irreducible, we need to show that for any xi G S, there exists 
n> 1, such that P^''{xi,A) > 0. If xi < x* , then g{xi,e) < g{0,0) + axi + be < x*, which implies 
that (p^A n [g{xi, e), oo)) > 0. Because of the assumptions on the function g, and the fact that the 
distribution of Yi given Xi = xi has positive probability everywhere, so P{xi,A) > 0. On the other 
hand, if xi > x* , it is easy to see that g{xi,e/2) < g{xi,e) < xi. If (/(xi, e/2) < x*, then by the 
same argument above, we have P{xi^A) > 0. However, if g(xi,e/2) > x*, then ag(xi,e/2) + 6e < 
g(xi,e/2) — 5(0,0) < axi + be/2. Hence we have x* < g{xi,e/2) < xi — (6e)/(2a). By induction, 
there exists n > 1 such that g{xn,e/2) < xi — n{be)/{2a) < x* , where xt = g{xt-i, e/2) for 
t = 1, . . . , n. Since e > 0, and the function g is increasing in both coordinates, so P'^~^^{xi, A) > 0. 
Hence {Xt,t > 1} is 0— irreducible. 

We now show that {Xt,t > 1} is aperiodic, i.e., a 1^— irreducible Markov chain is said to be 
aperiodic if there exists a small set A with (p{A) > such that for any x E A, P{x, ^) > and 
P'^{x, A) > 0. Note that in the setting of the proposition, any compact set is a small set. So we take 
A = [x* , K] for some positive K large enough. For any xi £ A, from the proof of irreducibility, 
it is easy to see that P{xi,A) > 0. Similarly we have P'^{x,A) = P{X2 G A\Xq = x) > P{X2 £ 
A\Xi G A)P{Xi G A\Xo = x)>0. 

To check the drift condition, let V{x) = 1 + x. There exists 5 > 0, such that a + b < 1 — 5. For 
X > (5(0,0) +5)/{l-a-b-6), we have 

E{V{Xi)\Xo = x} = E{l + Xi\Xo = x) = l + E{g{x,Yo)\Xo = x} 

< 1 + g{0, 0) + (a + b)x < (1 - (5)(1 + x) = (1 - 6)V{x). 

Hence the drift condition holds by taking the small set A = [xq, {g{0,0) + 6} / {1 — a — b — 5)], which 
establishes the geometric ergodicity oi {Xt}. It is well known that a geometrically ergodic Markov 
chain starting from its stationary distribution is strongly mixing with geo metrically decaying rate, 
hence is an ergodic stationary time series (e.g., Meyn and Tweedie (20091 )). Denote > 1} as a 



sequence of iid uniform (0, 1) random variables, then it follows from Yt = F-^^{^t) that {Yt,t > 1} 
is stationary and ergodic. 

B.4. Proof of Theorem [1] 

We first show the identifiability and then establish the consistency result using LemmalU Through- 
out the proof, we assume that the process {{Yt,Xt),t £ Z} is in its stationary regime. Note that 
by assumption (Al), Xt{9) > Xq £ 1Z{B), which implies rit{9) > B^^{xg). So it follows from 
assumptions (A2) and (A4) that for any 9 £ Q, 

Ek{9) = E{YtB~\Xt{9)) - A{B~\Xt{9)))} 

< E{Yt sup B-\Xt{9))} - A{{B~\x*g)) < 00. 
6»ee 
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This implies E/+(0) < oo. Denote Mn{e) = J2t=i^t{0)/n, then MJO) ^ M(9) = E{yi?7i( , 
A{rii{9))} according to the extended mean ergodic theorem (see Bilhngsley (19951 ) pp. 284 and 



495). In order to prove the identifiabihty, we need to show that is the unique maximizer of 
M{0), that is, for any 6* G 6 \ {Oq}, M{e) - Af(6'o) < 0. First it follows from assumption (A5) that 
for any 9^0^ and all t, Pe,{Gt{e, 6^)) > 0, where Gt[e, Bo) = {Xt{e) / Xt{9o)}. Let G = Gt{e, 9o), 
then we have 

M{e)-Mi9o) = E[Yt{B-\Xt{e))-B-'{Xt{eo))} 

-{A{B-\Xt{e))) - A{B-\x,ieo)))}] 

= E[Xt{9o){B-\Xtie)) - B-^{Xt{eo))} 
-{A{B-HXt{9)))-A{B'HxM))}] 

= [ Xt{eo){B-\Xt{6)) - B-\Xt(eo))} 
Jg 

~{A{B-HXt{9))) - A{B-\XM))}dPe,. 

On the set G, there exists c G M between B~^{Xt{9)) and B^^{Xtieo)) such that A{B^^{Xt{9))) - 
A{B-'^{Xt{9o))) = B{c){B-^{Xt{e)) - B~'^{Xt{eo))} by the mean value theorem. It follows from 
A"{r]) > that A{r]) is strictly convex and c must be strictly between B~^{Xt{0)) and B~^{Xt{6Q)). 
So there exists ^ G M lying strictly between Xt{9) and Xt(0o) such that ^ = B{c). Therefore 

Mi9) - M(0o) = / iXti9o) - 0{B~HMO)) - B-\Xt{9o))}dPe,. 

JG 

Since B{7]) is strictly increasing, so {Xt{9Q) - i){B~^{Xt{9)) - B-^{Xt{9Q))} < in either of the 
two cases: Xt{9) < Xt{9o) and Xt{9) > Xt{9o). Hence M{9) - M{9o) < 0, for any 9 / ^o, which 
establishes the identifiability. To show the consistency, first note that by assumption (A4), we have 

Esnplt{9) = E{Ytsnp B-\Xt{9)) - inf AiB-\Xti9)))} 
See eee 

< E{YtsupB-^{Xt{9))} - A{B-^{x*)) < oo. 
eee 

The function /q in Lemma [1] can be defined as 

fe{y) = yiB-HgL{yo,y-i, ...))- ^(^^^(^^(yo, y-i, • • •))), 

where y = {yi,yQ,y-i, . . .). Hence it follows from assumption (A2) and Lemma [1] that M{9) is 
upper-semicontinuous and for any compact subset K <Z Q, limsup„_i.oo supgg^ Mn{9) < sup^g^ M{9). 
Take Uq as a local base of 9q and let [/ G Z^o be a neighborhood of 9q, then Lemma [T] can be applied 
to \ [/. Because u.s.c function attains its maximum on compact sets and M{9) < M{9q) for any 
9 ^ 9q, we have 

limsup sup Mn{9)< sup M{9) < M{9q), Pg^-a.s. (5.6) 

n^oo eee\u 6»ee\c/ 

Notice that for any 9 ^ U, Mn{9) < supg^Q^jj Mn{9). Let w G such that ()5.6p holds and 

M{9q) = \\m.n^oo Mn{9Q). For such uj, suppose 9n ^ U infinitely often, say, along a sequence 
denoted by N, then 

liminfM„(^„) < liminf^ M„(4) < limsup M„(a„) 

< limsup supM„(6') < limsupsupM„(0). (5.7) 
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However, according to (j5.6p . we have 

limsup sup Mni0) < sup M{e) < M{9o) = lim M„(6'o) < liminf M„(^n), 

n^oo 9ee\U 9ee\U n~^co n->oo 

which contradicts (15. 7p . Hence there exists a null-set Nu such that for all oj ^ A'^;/, 6n £ U for all 
n large enough. It follows by taking any set U gUq that converges to almost surely. 

B.5. Proof of Theorem H 

We define a linearized form of rit{9) as r/J(0) := r]t{9o) + {d — 9o)^'r]t, and the corresponding linearized 
log- likelihood function of l{6) as 

n n 

t=i t=i 

Let n = ^/^(^ — ^o)) then define 

n n n n 

t^l t^l t^l t=l 

n n 



71 



n 

+ YiMvt + u^n-'/%) - A{i^t) - u^n-'/'B{7jt)ijt}. (5.8) 

t=i 

Let St = Tr^l'^{Yt - B{7]t)}rit, then E{st\Tt.i) = n-^^^E[{Yt - B{r]t)}rit\Tt-i] = 0, so {st,t> 1} is 
a martingale difference sequence. Note that 

n 1 " 

Y^{stsJ\J^t^i) = -Y,n{Yt-B{vt)}\vt^\Tt^i] 
t=l t=l 

n. ^ — ' 



n 

t=i 



which converges almost surely to $7 by the mean ergodic theorem and assumption (A7). Moreover, 
for any e > 0, 

n 

^E{stsfl[|3^l>,]|J't_i} 
t=i 

n 

t=l 
n 

< l/nYvtVt'^myt - B{r]t)}'^l[i{Yt~BM}m\>M]\^t-i] 



t=i 



E[{Yi - B{r]i)}'^rjirii'^l[i!^Yt-B(rit)}m\>M]] as n ^ oo 
as M ^ 0. 
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Then it follows from the central limit theorem for martingale difference sequences that 

n 

^ St y ~ 7V(0, fi), as n — > oo, 
t=i 

where is evaluated at ^o- The other term in (jS.Sp by Taylor expansion is 

^ t=i t=i 

which is of the order of + op(l). Hence Rn{u) —u^V + ^u^i^u, where V ~ A^(0, fl). 

It then follows that argmin„i?Ji(n) argmin„{— + ^u'^Qu} = Q,~^V ~ A^(0, $7"^). 

For the rest of the proof, we show that the difference between Rn{u) := 1{0q) — 1{6q + un~^^'^) 
and Rn{u) is negligible as n grows large. By writing 6 = 9q + un'^^"^, the difference becomes 

n 

Ri{u)-Rn{u) = Y,{Yt-B{r]t)}{vt(.0)-7it-u^n-'/%} 
t=i 

n 

t=i 

-B{rit){vt{e) -Vt- u^n-^/^t}]. (5.9) 

By Taylor expansion, the first term in (15. 9p is l/(2n) — B(rit)}u'^rjt{Oi)u = \/{2n)u^ 

EILii^t - B{m)}m + EILii^t - B{m)]{m{0*t) - m]]u, where ei lies between 6 and ^o, and 
fjt = d'^rit/dOdO^ . Since 

^j2{Yt-B{r^t)}m ^ n{Yt-B{rit)}iit] 

= E[%E{yt-B(?7i)|j;_i}] =0, 

and ^/nY^^=i{yt — B{rjt)}{'riti(^t) ~ Vt} ~^ under the smoothness assumption, so the first term 
in (j5.9p converges to uniformly on [—K, K] for any K > 0. We now apply Taylor expansion to 
each component in the second term of (j5.9p . 

A{r)t{9)) = A{rjt) + u^n~^/^ B{7]t)^t 

+ l.u^{B{rjt{ei))U0l) + B\eiMei)rjMf}n, 

A{r]t + u^n-^/'^i)t) = A{r]t) + B{r]tW n'^/'^Tit + B' {c)'nt'nt'^u, 

rit{0) = Vt{Oo + un-^l'^) = i]t + i)tu^n-^/'^ + ^u^i^t{e^)u, 

where < c < u^n^^^'^tit, 01 and 02 both lie between and 9. Therefore the second term in (|5.9p 
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becomes 

n 

t=i 

n ^ 

t=l 

-Air,t) - B{r,t)u^n-'^% - B' {c)ritrit'^ u - B{r,t)^u^Vtie;)u] 

1 " 

-B'ic)j}tm^}]u, 

which converges to on a compact set of u under smoothness assumptions. So (j5.9p converges 
to as n — 7- oo, which implies that argmin„i2„(n) and argmin„i?Ji(u) have the same asymptotic 
distribution, i.e., 

argmin„i?„('u) A n~^V ~ iV(0,O~^). 

Note that argmin„i?„(ii) = argmax^ 1{9q + un~^^'^) = ^/n{9n — Oq), where 9n is the conditional 
maximum likelihood estimator. Hence 

VniOn-eo) N{0,n-^), as n^oo. 
B.6. Proof of Theorem [3] 

According to Theorems [1] and [21 it is sufficient to establish the identifiability of the model, that 
is, we need to verify assumption (A5). Suppose for some t £ Z, Xt{6) = Xt{Oo), P^Q-a.s, then 
5 + aXt-i{9) + PYt-i = 5o + aoXt-i{0o) + ^o^t-i- It follows from ([33]) that 

^ OO ^ oo 

(/3 - Po)Yt-i ^So-S + ao{-^ + V a^oYt-k-2) - a{- + /3 V a'^Ft^fc^a) • 

1 — an — ' 1 — a ^ — ' 

fe=0 k=0 

If /? / (3o, then Yt-i £ span{yf_2, Yt-3, . . .} which contradicts the fact that Var(Y't_i|7"t_2) > 0. So 
P must be the same as /3o- Similarly one can show that a = oq and 6 = Sq, which implies 6 = Oq. 
Hence the model is identifiable. 



B.7. Proof of Remark H 

The most difficult case is the derivative with respect to ^2 = « and we only give its proof, since the 
arguments for 6 and /? are similar. First note that 

where dB{rii)/da = 5/{\ — a)"^ + jS XlfeLi ka^^^Y^j^. Then on account of stationarity, one can show 
that 



00 



j-^ ' 1 — a(a + p) f— ' 



^ ^' k=l k=l 

where ii = EYt< 00. Hence E[B'(r/i(6lo)){(9??i(6'o)/9a}^] < cx) if 7y(0) < 00. 
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B.8. Proof of Proposition [5] 

The proof considers two separate cases: q = 1 and q > 1, since they require different methods to 
construct the state space. 

1. q = 1: without loss of generahty we consider p = 2. Denote Xt = (At,Aj+i), then Xj is a 
Markov chain. Note that Xt > X* = (5/(1 — ai — 02). X^ can be constructed by iteratively 
imposing the random function fu, u £ (0, 1), 

: [A*,oo) X [A*,oo) — > [A*, 00) x [A*, 00) 

x=(Ai,A2) ^ (A2,(5 + aiA2 + a2Ai+/3F-i(n)). 

For any x = {xi,X2),y = (yi,y2) in the state space S = [A*, 00) x [A*, 00), define metric p as 
/o(x, y) = — yi\ + 1021x2 — y2\, where > 0, i = 1,2 and 'Wi,W2 are to be decided. Let 

xi = (A?, A^) := (A*, A*), then for any x = (Ai, A2) we have 

Ep(Xi(x),Xi(xi)) = / p(/„(x),/„(xi))du 

Jo 

= a2W2\Xi - Xi\ + {wi + W2{ai + b)}\X2 - Xl\, 

where the last equation holds because A^ > A*. Therefore it is sufficient to find an r G (0, 1) 
and strictly positive {'Wi,W2) such that 

Ep(Xi(x),Xi(xi)) < r/)(x,xi) = r{«;i|Ai - A?| + «;2|A2 - A^|}. 

This can be obtained if the equation r^ — (oi+6)r— 02 = yields a root r+ = "■'^^^~^V('^^+^'i +^^ 2 ^ 
1. It can be shown that under ai + 02 + (3 < 1 the root r+ G (0, 1). Note that the choice of 
{wi,'W2) is not unique. 

2. g > 1: without loss of generality we consider the INGARCH(2,2) model. Define a Markov 
chain X^ = (Yt, Xt, Xt+i), then the chain can be obtained by defining the iterated random 
functions /„ : Zq x [A*, 00) x [A*, 00) — > Zq x [A*, 00) x [A*, 00) as /(x) = /(n,Ai,A2) = 

A2,(5+aiA2+a2Ai+/3iF^^(n)+/32n), where A* = 6/{l-ai-a2) andu € (0,1). Note 
that we cannot define Xj in the same way as in the first case, since otherwise it contradicts the 
independence assumption of {ut} sequence. Define the metric /> on = Zq x [A*, 00) x [A*, 00) 
as p(x,y) = Yd=i'^i\^i - y*!' where x = ixi)^^^,y = {yi)1^i and Wi > = 1,2,3. Take 
xi = (no. A?, A2) := (0, A*, A*), then for any x = (n, Ai, A2), we have 

Ep(Xi(x),Xi(xi)) = [' \fu{^) - fu{^i)\du 

Jo 

= l32W3\n — + W3a2\Xi — Xi\ 

+{wi +W2 + (ai + A)w^3}|A2 - A^|. 

Similarly to the first case, one needs to solve the inequality 

{02 + I32)iwi+W2) < [r - (ai + f3i)]{a2 + f32)w3 
< r{wi + W2)[r - {ai + Pi)] 

for an r G (0, 1) and a strictly positive triple {wi, W2, W3). This can be achieved if ai + 02 + 
/3i + f32 < 1, which implies the quadratic equation — [ai + /3i)r — [02 + (^2) = has a root 
r+ G (0, 1). The result hence follows by a simple induction. 
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B.9. Proof of Theorem |4] 



According to Theorem [2l we only need to establish the identifiability of the model. Similar to 
the proof of Theorem [3l one can demonstrate that if Xt{9) = Xt{9Q), Pg^-a.s. for some t, where 
6*0 = ((5o,ao,/3o,/3i,o, • • • ,Pk,o), then 

K 

(/3 - Po)Yt-i + Y^ih - f3k,o){Yt-i - a)+ 

k=l 

= 60-6 + aoXt^iiOo) - aXt^iiO) G a{Yt^2,Yt^z, ■ ■ ■}■ 

It follows that /3 = fSo and f3 = /3fc,0; k = 1, . . . , K . Similarly one can show that 6 = 60 and a = oq, 
hence 9 = Oq which verifies the identifiability of the model. 
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